Abstract

This paper considers the multiuser multiple-input multiple-output (MIMO) broadcast channel. We 
consider the case where the multiple transmit antennas are used to deliver independent data streams to 
multiple users via vector perturbation. We derive expressions for the sum rate in terms of the average 
energy of the precoded vector, and use this to derive a high signal-to-noise ratio (SNR) closed-form 
upper bound, which we show to be tight via simulation. We also propose a modification to vector 
perturbation where different rates can be allocated to different users. We conclude that for vector 
perturbation precoding most of the sum rate gains can be achieved by reducing the rate allocation 
problem to the user selection problem. We then propose a low-complexity user selection algorithm 
that attempts to maximize the high-SNR sum rate upper bound. Simulations show that the algorithm 
outperforms other user selection algorithms of similar complexity. 
I. Introduction

Multiuser multiple-input multiple-output (MIMO) technologies may be employed by cellular 
base stations and wireless LAN access points to transmit messages to $K$ non-collocated
users without resorting to increasing bandwidth or transmit power. By exploiting the richness of 
multipath environments, such systems are able to achieve downlink data rates that scale linearly 
with the number of antennas at the transmitter, as is possible with simpler point-to-point MIMO 
communications, e.g. [1], [2]. 

An optimal sum rate achieving transmission method for the multiuser MIMO downlink is 
dirty-paper coding (DPC) [1,3]. As this scheme requires computationally infeasible random
coding and binning operations, it remains a theoretical construction. Linear precoders such 
as channel inversion [4] and zero-forcing beamforming [5] can be used for lower complexity 
implementations. 

A promising practical transmission method with better performance than linear precoders is 
vector perturbation (VP) precoding [6]. With VP precoding, the data vector to be transmitted is
constrained to lie within a $2K$-dimensional hypercube of side length one, and is modified by the
addition of a perturbation vector consisting of complex integers, before being passed through a
channel inverting linear precoder. The addition of the perturbation vector significantly reduces
the required transmit power, and can be removed completely by independent modulo operations
at each receiver. The choice of the perturbation vector is an instance of the well-studied NP-hard
problem of finding the closest lattice point, where here the lattice is determined by the
channel. Common methods for performing the search are the sphere-decoding algorithm [7-9] and
suboptimal lattice reduction methods.

Due to the perturbation process, the sum rate performance of vector perturbation systems is
more difficult to analyze than that of linear precoding systems, and exact expressions for performance
measures remain an outstanding problem. This is primarily because the performance is
a function of the average power of the precoded signal, $\mathcal{E}_{se}$, which determines the effective noise
power at the output of each user's demodulator. It is hard to calculate $\mathcal{E}_{se}$ since it is determined
by a closest lattice point search. Closed-form representations of $\mathcal{E}_{se}$ are not available; however,
some useful closed-form bounds have been derived in [10]. In [6], an expression that gave
insight into the choice of perturbation vector was derived, but it still required numerical simulation
to evaluate $\mathcal{E}_{se}$. A statistical physics based approach was used in [11,12] to derive $\mathcal{E}_{se}$ in the
limit as $N_T, K \to \infty$, where $N_T$ is the number of antennas at the base station (BS). The approach
in [11,12] requires a number of assumptions, and the results are in terms of a fixed-point integral
equation, which requires numerical evaluation. Another related result was given in [13], where
it was shown that sub-optimal lattice reduction based sphere-encoding [14] achieves the full-diversity
order. Additionally, expressions for bit error rates, assuming $\mathcal{E}_{se}$ is known, have been
given in [15].

To the authors' knowledge the sum rate of vector perturbation systems has not been analyzed. 
Other practical issues also remain open, such as how to select a subset of users from a set of 
available users, or how to allocate different rates to users in order to maximize the sum rate. 
Various user selection and rate allocation algorithms have been suggested for linear precoders 
such as zero-forcing [5] and zero-forcing dirty paper coding [16] but not for vector perturbation 
systems. These three problems are the subject of this work. 

In this paper, we provide an expression for the sum rate of vector perturbation systems based on
the assumptions that $\mathcal{E}_{se}$ is known exactly and the data to be transmitted is uniformly distributed.
Then we show that in the high-SNR regime the effect of the modulo operation diminishes, and hence it has
no bearing on the sum rate performance of the system. Using this high-SNR property, we derive
a lower bound to this sum rate, as well as an asymptotic closed-form high-SNR upper bound.
Simulation results suggest that this upper bound is tight for transmit SNRs greater than 10 dB.

We then propose a modification to vector perturbation precoding so that different rates may be
allocated to different users. We examine the problem of optimizing the rate allocation and propose
a sub-optimal rate allocation algorithm, which uses the simple $\mathcal{E}_{se}$ approximation derived in [15].
We see that the rate allocation improves the performance in the low-SNR regime. However, for
the vector perturbation precoding system the sum rate may be well approximated by an on-off
function. We numerically determine that this on-off function has mutual information of at most
0.2992 bits less than the actual mutual information. Using this knowledge, we propose that the
rate allocation problem can be reduced to one of user selection.

Therefore, we next turn our attention to practical user selection algorithms. We propose
a low-complexity algorithm for user selection for vector perturbation precoding systems.
Specifically, we propose a greedy algorithm which chooses users successively in order to maximize
the new sum rate upper bound at high SNR. We show that the selection criterion becomes
equivalent to the selection criterion used in the algorithms proposed in [5,16], but differs in the user
shedding criterion. We provide simulation results that show that the sum rate of our system is
very close to that achieved by an exhaustive search through all possible combinations of users,
and that our proposed algorithm outperforms other low-complexity algorithms [5,16]. Simulation
results also show that user selection outperforms our proposed rate allocation algorithm, and
that the rate allocation algorithm provides negligible improvement if used in conjunction with
user selection.

II. System Model 

We now detail the system model. We use $(\cdot)'$ to denote matrix transpose, $(\cdot)^\dagger$ to denote matrix
conjugate transpose and $\mathrm{Vol}(\cdot)$ to denote the Jordan-measurable volume [17] of a region. We use
$(\cdot)^+$ to denote the Moore-Penrose pseudoinverse [18] and denote the set of Gaussian (complex)
integers as $\mathbb{Z}[j]$. We use $\lfloor \cdot \rceil$ to denote element-wise rounding to the nearest Gaussian integer.

We consider the downlink of a narrowband multi-user MIMO system with $N_T$ transmit antennas
broadcasting to $K \le N_T$ spatially dispersed users. Each user has a single receive antenna. The
users are selected from a set of $U$ available users. Each channel realization $H \in \mathbb{C}^{K \times N_T}$ consists
of elements $h_{k,t} \in \mathbb{C}$ that represent the channel between the $k$th user and the $t$th transmit antenna.

Given the transmitted vector $x = [x_1 \ldots x_{N_T}]' \in \mathbb{C}^{N_T \times 1}$, the received symbol at user $k$ is given
by

$$y_k = h_k x + n_k, \qquad (1)$$

where $n_k$ is additive white Gaussian noise with distribution $\mathcal{CN}(0, 1)$ and $h_k = [h_{k,1} \ldots h_{k,N_T}]$.
The received symbols can be combined as $y = [y_1 \ldots y_K]' \in \mathbb{C}^{K \times 1}$ to give

$$y = Hx + n, \qquad (2)$$

where $n = [n_1 \ldots n_K]'$. The transmitted vector $x$ is a modified "perturbed" and "precoded" form
of the data vector $a = [a_1 \ldots a_K]' \in \mathrm{CUBE}^K$, where $\mathrm{CUBE}^K$ is the $K$-ary Cartesian product of
the region

$$\mathrm{CUBE} = \{a : |\mathrm{Re}\{a\}| < 0.5,\ |\mathrm{Im}\{a\}| < 0.5\}.$$

Clearly, $\mathrm{Vol}(\mathrm{CUBE}^K) = 1$. To generate $x$, the data vector $a$ is first perturbed and then precoded
to create the sphere-encoded signal vector, $s$, according to

$$s = F(a + p), \qquad (3)$$

where we set $F = H^+$ to be a precoding matrix and $p$ is the Gaussian (complex) integer-valued
perturbation vector given by

$$p = \arg\min_{q \in \mathbb{Z}[j]^K} \|F(a + q)\|^2. \qquad (4)$$

Now, choosing $p$ in (4) is an instance of the well-studied NP-hard problem of finding the closest lattice point.
We assume, for the purposes of analytical tractability, that the algorithm used to solve (4) gives
the optimal solution. An optimal approach, e.g. the sphere-decoding algorithm of [7], will have
complexity exponential in $K$. Some suboptimal methods of polynomial complexity may be
employed when $K$ is large, such as the lattice reduction based approach of
[14] and the singular value decomposition based approach of [19]. For our simulations, we used
the sphere decoding algorithm proposed in [20].
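
As a rough illustration of the search in (4), the following sketch enumerates a small set of Gaussian integers for a two-user, two-antenna example. It is not the sphere decoder of [20]: the bounded search radius, the helper names and the example channel are all assumptions made for illustration, and a bounded search is only guaranteed to be no worse than using no perturbation at all.

```python
import itertools

def matvec(F, v):
    """Multiply matrix F (list of rows) by column vector v."""
    return [sum(f * x for f, x in zip(row, v)) for row in F]

def energy(v):
    """Squared Euclidean norm ||v||^2."""
    return sum(abs(x) ** 2 for x in v)

def inverse_2x2(H):
    """Inverse of a 2x2 complex matrix (equals the pseudoinverse when H is invertible)."""
    (a, b), (c, d) = H
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def brute_force_perturbation(F, a, radius=1):
    """Minimise ||F(a + q)||^2 over Gaussian integers with |Re|, |Im| <= radius."""
    ints = range(-radius, radius + 1)
    candidates = [complex(re, im) for re in ints for im in ints]
    best_q, best_energy = None, float("inf")
    for q in itertools.product(candidates, repeat=len(a)):
        e = energy(matvec(F, [ai + qi for ai, qi in zip(a, q)]))
        if e < best_energy:
            best_q, best_energy = list(q), e
    return best_q, best_energy

# Hypothetical 2-user, 2-antenna channel and a data vector inside CUBE^2.
H = [[1.0 + 0.2j, 0.9 - 0.1j], [0.8 + 0.1j, 1.0 + 0.3j]]
F = inverse_2x2(H)
a = [0.4 - 0.3j, -0.45 + 0.2j]
p, e_vp = brute_force_perturbation(F, a)
e_plain = energy(matvec(F, a))  # energy without perturbation (q = 0)
```

Because $q = 0$ is among the candidates, `e_vp` can never exceed `e_plain`; the exhaustive loop grows as $9^K$ for `radius=1`, which is why practical systems rely on sphere decoding or lattice reduction instead.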

For analytical purposes we will consider uniformly distributed inputs, where $a$ is a random
vector with i.i.d. elements and probability density function $p(a) = \chi_{\mathrm{CUBE}^K}(a)$, where $\chi(\cdot)$ is the
characteristic (indicator) function.

The final step in generating $x$ is to scale $s$ as follows:

$$x = \sqrt{\frac{P}{\mathcal{E}_{se}(F)}}\; s, \qquad (5)$$

where $P$ is the transmit signal-to-noise ratio (SNR), and

$$\mathcal{E}_{se}(F) \triangleq E_a\left[\|s\|^2\right] \qquad (6)$$

is the expected power of the sphere-encoded vector $s$ for a channel instance (packet) $H$, where
the expectation is taken over $a$. That is, the expected power required to transmit each packet is
constant. Hence the receiver only needs to know $\mathcal{E}_{se}$, which is a data-independent quantity, in
order to decode the received signal correctly¹.

At the $k$th user's receiver, the data is recovered using a modulo demodulator [6]

$$\hat{a}_k = \left[\sqrt{\mathcal{E}_{se}(F)/P}\; y_k\right]_{\mathrm{mod\,CUBE}} = \left[a_k + p_k + \sqrt{\mathcal{E}_{se}(F)/P}\; n_k\right]_{\mathrm{mod\,CUBE}} = \left[a_k + \eta_k\right]_{\mathrm{mod\,CUBE}}, \qquad (7)$$

where $a_k$, $p_k$, and $n_k$ are the $k$th elements of the vectors $a$, $p$, and $n$ respectively, and
$\eta_k = \sqrt{\mathcal{E}_{se}(F)/P}\; n_k$ is the effective noise for user $k$. Therefore $\mathrm{Var}\{\eta_k\} = \mathcal{E}_{se}(F)/P$. The function
$[\cdot]_{\mathrm{mod\,CUBE}^K}$ denotes a modulo operation, which is defined as $[\cdot]_{\mathrm{mod\,CUBE}^K} = [\cdot] - \lfloor \cdot \rceil$. This
operation maps a point lying outside the region $\mathrm{CUBE}^K$ to the corresponding point inside $\mathrm{CUBE}^K$.
The modulo operation is applied to the real and imaginary parts independently.
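
The modulo demodulator can be sketched in a few lines. This is an illustrative implementation under the definition $[\cdot] - \lfloor \cdot \rceil$; the function names and the numerical values are assumptions, and ties at exactly $\pm 0.5$ are rounded upwards here, a boundary case of probability zero.

```python
import math

def round_half_up(x):
    """Nearest integer, with ties rounded towards +infinity."""
    return math.floor(x + 0.5)

def mod_cube(z):
    """[z] mod CUBE: subtract the nearest Gaussian integer, per real/imag part."""
    return complex(z.real - round_half_up(z.real),
                   z.imag - round_half_up(z.imag))

a_k = 0.25 - 0.4j      # data symbol inside CUBE
p_k = 3 - 2j           # Gaussian-integer perturbation added at the transmitter
eta_k = 0.01 + 0.02j   # small effective noise sample
recovered = mod_cube(a_k + p_k + eta_k)
```

As long as $a_k + \eta_k$ stays inside CUBE, the receiver recovers $a_k + \eta_k$ without knowing $p_k$.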

III. Sum Rate of Vector Perturbation Precoding 

In this section, we derive the sum rate of the VP precoding system using uniformly distributed
inputs, given that the value of $\mathcal{E}_{se}(F)$ is known. We derive a lower bound to this sum rate which
is also approached asymptotically as the transmit SNR $P \to \infty$. We then derive an upper bound
to the sum rate using a lower bound to $\mathcal{E}_{se}(F)$ that we recently derived in [10].

First, we derive an expression for the sum rate of the VP precoding system in terms of $\mathcal{E}_{se}(F)$.
Define $I(a_k; \hat{a}_k | H, F)$ as the mutual information between $a_k$ and $\hat{a}_k$ given channel matrix $H$ and
precoding matrix $F$.

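Theorem 1: The sum rate $R_{VP}$ of an $N_T \times K$ vector perturbation system with uniformly
distributed inputs is

$$R_{VP}(H, F) \triangleq \sum_{k=1}^{K} I(a_k; \hat{a}_k | H, F) = K \log \frac{P}{\pi e\, \mathcal{E}_{se}(F)} + 2K\, \Omega\!\left(\frac{\mathcal{E}_{se}(F)}{2P}\right), \qquad (8)$$

where

$$\Omega(\gamma) = \frac{1}{2}\log e + \int_{-1/2}^{1/2} \frac{1}{\sqrt{2\pi\gamma}} \sum_{s=-\infty}^{\infty} e^{-\frac{(\xi - s)^2}{2\gamma}} \log\left(\sum_{t=-\infty}^{\infty} e^{-\frac{(\xi - t)^2}{2\gamma}}\right) d\xi. \qquad (9)$$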



¹ In a practical system, the transmitter would calculate the packet power and then scale the packet to satisfy the power
constraint. If the packet is long enough, the empirical and expected values of $\mathcal{E}_{se}$ will be close.



Proof: See Appendix I. ■

We now discuss this result. We see that $\mathcal{E}_{se}(F)$ and the function $\Omega(\gamma)$ are the key terms for
understanding the sum rate of the vector perturbation system; hence we examine these two
terms in detail, one at a time.

With regard to $\mathcal{E}_{se}(F)$, we note that no exact analytical results have yet been obtained. Some
partially numerical results concerning the value of $\mathcal{E}_{se}(F)$ were presented in [6]. In [11,12], using
the replica method of statistical physics, an asymptotic result for $\mathcal{E}_{se}(F)$ was derived as a coupled
fixed-point representation. However, for the case of uniformly distributed inputs, we derived a
lower bound in [10], which was shown to be a good approximation for most input distributions.
We will subsequently use the result of [10] to derive an asymptotic upper bound on the sum
rate.

Next, we turn to the term $\Omega(\gamma)$, where $\gamma = \mathcal{E}_{se}(F)/(2P)$. The term $\Omega(\gamma)$ captures the effect
of the modulo operation on the Gaussian noise. We see, from (34) in Appendix I, that

$$\Omega(\gamma) = \frac{1}{2}\log(2\pi e \gamma) - H(\xi), \qquad (10)$$

where $\xi = \mathrm{Re}\{[\eta_k]_{\mathrm{mod\,CUBE}}\}$. As $P \to \infty$, it follows that

$$\lim_{P \to \infty} H(\xi) = \frac{1}{2}\log(2\pi e \gamma),$$

which concurs with the intuition that the distribution of $\xi$ approaches $\mathcal{N}(0, \gamma)$ as the noise
variance decreases. Applying this to (10) gives

$$\lim_{P \to \infty} \Omega(\gamma) = 0. \qquad (11)$$

Moreover, since $\frac{1}{2}\log(2\pi e \gamma)$ is the maximum entropy of any random
variable with variance $\gamma$, we have $H(\xi) \le \frac{1}{2}\log(2\pi e \gamma)$ and therefore $\Omega(\gamma) \ge 0$. As $P \to 0$, the distribution of $\xi$ approaches a
uniform distribution over the interval $[-\frac{1}{2}, \frac{1}{2}]$. It follows that $\lim_{P \to 0} H(\xi) = 0$, and thus

$$\lim_{P \to 0} \Omega(\gamma) = \frac{1}{2}\log(2\pi e \gamma).$$

In summary, $\Omega(\gamma)$ is an increasing function of $\gamma$ (and decreasing in $P$) with range $(0, \frac{1}{2}\log 2\pi e \gamma)$
for $\gamma > 0$. In the high-SNR regime, $\Omega(\gamma)$ will be small, as the effect of the modulo operation
diminishes, and is therefore negligible when it comes to determining the sum rate.
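
The behaviour of $\Omega(\gamma)$ can be checked numerically. The sketch below evaluates (10) in bits, assuming that $\xi$ is a zero-mean Gaussian of variance $\gamma$ wrapped onto $[-\frac{1}{2}, \frac{1}{2})$; the midpoint integration rule, the truncation of the infinite sum and the function names are illustrative choices rather than part of the analysis.

```python
import math

def wrapped_gaussian_pdf(xi, gamma, terms=20):
    """Density of a N(0, gamma) variable reduced modulo 1 onto [-1/2, 1/2)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * gamma)
    return norm * sum(math.exp(-(xi - s) ** 2 / (2.0 * gamma))
                      for s in range(-terms, terms + 1))

def wrapped_entropy_bits(gamma, steps=2000):
    """Differential entropy H(xi) in bits, by the midpoint rule."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        xi = -0.5 + (i + 0.5) * h
        f = wrapped_gaussian_pdf(xi, gamma)
        if f > 0.0:  # skip points where the density underflows to zero
            total -= f * math.log2(f) * h
    return total

def omega(gamma):
    """Omega(gamma) = 0.5*log2(2*pi*e*gamma) - H(xi), as in (10)."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * gamma) - wrapped_entropy_bits(gamma)

small, medium, large = omega(1e-3), omega(0.1), omega(4.0)
```

For small $\gamma$ (high SNR) the result is numerically zero, while for large $\gamma$ it approaches $\frac{1}{2}\log_2(2\pi e \gamma)$ because the wrapped density becomes uniform. The truncation to 41 terms is only adequate while $\sqrt{\gamma}$ is well below 20.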



We now use Theorem 1 to derive the following useful bounds and asymptotic values of the
sum rate. By noting that $\Omega(\gamma) \ge 0$, and that it approaches 0 as $P \to \infty$, we have the following lower
bound and asymptotic result.

Corollary 1: The sum rate $R_{VP}$ of an $N_T \times K$ vector perturbation system with uniformly
distributed inputs satisfies the lower bound

$$R_{VP,LB} = K \log P - K \log\left(\pi e\, \mathcal{E}_{se}(F)\right), \qquad (12)$$

which is approached as $P \to \infty$.

Additionally, we also have the following asymptotic upper bound, which we will use as a basis
for the user selection algorithm in Section V.

Corollary 2: As $P \to \infty$, the sum rate $R_{VP}$ of an $N_T \times K$ vector perturbation system
employing uniformly distributed inputs and precoding matrix $F = H^+$ has the following
upper bound:

$$\lim_{P \to \infty} R_{VP} \le K \log \frac{P}{K} + \log\det(W) - K \log\left(\frac{\Gamma(K+1)^{1/K}\, e}{K+1}\right),$$

where $W = HH^\dagger$ and $\Gamma(\cdot)$ denotes the gamma function.

Proof: First, recall from our discussion of (8) in Theorem 1 that $\Omega(\gamma) \to 0$ as $P \to \infty$.
Then, we substitute the lower bound on $\mathcal{E}_{se}(F)$ from [10], namely

$$\mathcal{E}_{se}(F) \ge \mathcal{E}_{LB}(F) \triangleq \frac{K\, \Gamma(K+1)^{1/K}}{(K+1)\pi} \det(F^\dagger F)^{1/K}, \qquad (13)$$

into (8). Noting that $F = H^+$, and therefore $F^\dagger F = W^{-1}$, completes the proof. ■
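
The substitution in the proof can be verified numerically. The sketch below evaluates the bound of Corollary 2 directly and, alternatively, by inserting the right-hand side of (13) into the high-SNR form $K \log(P/(\pi e\, \mathcal{E}_{se}))$. Logarithms are taken to base 2 (rates in bits), and the function names and input values are illustrative assumptions.

```python
import math

def e_se_lower_bound(K, det_FhF):
    """Right-hand side of (13) for a given det(F^dagger F)."""
    gamma_term = math.exp(math.lgamma(K + 1) / K)  # Gamma(K+1)^(1/K)
    return K * gamma_term * det_FhF ** (1.0 / K) / ((K + 1) * math.pi)

def sum_rate_upper_bound_bits(K, P, det_W):
    """High-SNR upper bound of Corollary 2, in bits."""
    gamma_term = math.exp(math.lgamma(K + 1) / K)
    return (K * math.log2(P / K) + math.log2(det_W)
            - K * math.log2(gamma_term * math.e / (K + 1)))

K, P, det_W = 4, 100.0, 2.5  # illustrative numbers only
direct = sum_rate_upper_bound_bits(K, P, det_W)
# With F = H^+, det(F^dagger F) = 1 / det(W).
via_13 = K * math.log2(P / (math.pi * math.e * e_se_lower_bound(K, 1.0 / det_W)))
```

Both routes give the same number, which confirms that the $\pi$ in (13) cancels and that the determinant enters the bound only through $\log\det(W)$.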

IV. Rate Allocation for Vector Perturbation Precoding 

In this section we extend the system model by taking rate allocation into account, in
an attempt to further optimize the sum rate. Using a rate allocation matrix $\Lambda$, we derive an
expression for the sum rate and then discuss the performance gain yielded by the rate allocation.

We propose to decompose the channel matrix $H$ as

$$H = DVQ, \qquad (14)$$

where the decomposition in (14) is a variation of the QR decomposition such that $D = \mathrm{diag}(d_1, \ldots, d_K)$,
$V$ is lower triangular with ones on its diagonal and $Q$ is a unitary matrix. Then $H^+ = Q^+ V^+ D^+$.
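
One way to obtain a decomposition of the form (14) is Gram-Schmidt orthogonalisation of the rows of $H$. The sketch below assumes a square, full-rank $H$ so that $Q$ is unitary; the function names and the example matrix are illustrative, and a numerically robust implementation would use a library QR routine instead.

```python
def inner(u, v):
    """Row-vector inner product u v^dagger."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def dvq_decompose(H):
    """Return (d, V, Q) with H = diag(d) V Q, V unit lower triangular."""
    K = len(H)
    Q = []
    L = [[0j] * K for _ in range(K)]  # H = L Q with L lower triangular
    for k, h in enumerate(H):
        r = list(h)
        for j, q in enumerate(Q):
            L[k][j] = inner(h, q)  # component of h_k along q_j
            r = [ri - L[k][j] * qi for ri, qi in zip(r, q)]
        norm = inner(r, r).real ** 0.5
        L[k][k] = complex(norm)
        Q.append([ri / norm for ri in r])
    d = [L[k][k].real for k in range(K)]
    V = [[L[k][j] / d[k] for j in range(K)] for k in range(K)]
    return d, V, Q

# Hypothetical 2x2 channel.
H = [[1.0 + 1.0j, 0.5 + 0.0j], [0.2 - 0.3j, 1.5j]]
d, V, Q = dvq_decompose(H)
rebuilt = [[d[k] * sum(V[k][j] * Q[j][t] for j in range(2)) for t in range(2)]
           for k in range(2)]
```

With this construction the diagonal entries $d_k$ are real and positive, and $\prod_k d_k^2 = \det(HH^\dagger)$.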



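Instead of using $H^+$ as a precoding matrix, as was the case in Sections II and III, we now set
$F = Q^+ V^+ \Lambda$ to be a modified precoding matrix so as to take into account the rate allocation,
using $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_K)$ as a rate allocation matrix. Now the Gaussian (complex) integer-valued
perturbation vector $p$ is given by

$$p = \arg\min_{q \in \mathbb{Z}[j]^K} \|V^+ \Lambda (a + q)\|^2. \qquad (15)$$

We then scale $s$ to generate the transmit vector $x$ as follows:

$$x = \sqrt{\frac{P}{\mathcal{E}_{se}(F)}}\; s. \qquad (16)$$

The received signal at the $k$th user is then

$$y_k = \sqrt{\frac{P}{\mathcal{E}_{se}(F)}}\; d_k \lambda_k (a_k + p_k) + n_k,$$

and the recovered data symbol at the output of the modulo demodulator of the $k$th user is given
by

$$\hat{a}_k = \left[\sqrt{\frac{\mathcal{E}_{se}(F)}{P d_k^2 \lambda_k^2}}\; y_k\right]_{\mathrm{mod\,CUBE}} = \left[a_k + p_k + \sqrt{\frac{\mathcal{E}_{se}(F)}{P d_k^2 \lambda_k^2}}\; n_k\right]_{\mathrm{mod\,CUBE}} = \left[a_k + \eta_k\right]_{\mathrm{mod\,CUBE}}, \qquad (17)$$

where $\eta_k = \sqrt{\mathcal{E}_{se}(F)/(P d_k^2 \lambda_k^2)}\; n_k$ is the effective noise for user $k$.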



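Corollary 3: The sum rate $R_{VP\text{-}RA}$ of an $N_T \times K$ vector perturbation system with uniformly
distributed inputs and precoding matrix $F = Q^+ V^+ \Lambda$ is

$$R_{VP\text{-}RA}(H, F) = \sum_{k=1}^{K} I(a_k; \hat{a}_k | H, F) = \sum_{k=1}^{K} \left[\log\left(P d_k^2 \lambda_k^2\right) - \log\left(\pi e\, \mathcal{E}_{se}(F)\right) + 2\,\Omega\!\left(\frac{\mathcal{E}_{se}(F)}{2 P d_k^2 \lambda_k^2}\right)\right]. \qquad (18)$$

Proof: From (17) we see that $\mathrm{Var}\{\eta_k\} = \mathcal{E}_{se}(F)/(P d_k^2 \lambda_k^2)$; hence, by using this variance and following
the steps in Theorem 1, the proof is completed. ■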
We note that the choice of the optimal $\Lambda$ is difficult, as the rate is a function of $\mathcal{E}_{se}(F)$, whose
evaluation requires solving an NP-hard problem. In order to find a simple sub-optimal approach to the rate
allocation problem, we first examine the mutual information function $I(a_k; \hat{a}_k | \mathcal{E}_{se}(F), d_k)$ as a
function of $\lambda_k$. In Fig. 1, we plot $I(a_k; \hat{a}_k | \mathcal{E}_{se}(F), d_k)$ as a function of $\lambda_k$ for SNR = dB,
$\mathcal{E}_{se}(F) = 0.1$ and $d_k = 1$. We also plot a piece-wise linear approximation to $I(a_k; \hat{a}_k | \mathcal{E}_{se}(F), d_k)$,
namely

$$I_{PW}(a_k; \hat{a}_k | \mathcal{E}_{se}(F), d_k) = \max\left(0, \log\left(P d_k^2 \lambda_k^2\right) - \log\left(\pi e\, \mathcal{E}_{se}(F)\right)\right) = \max\left(0, \log\frac{P d_k^2 \lambda_k^2}{\pi e\, \mathcal{E}_{se}(F)}\right), \qquad (19)$$

as well as the mutual information of a Gaussian channel matched to have the same mutual
information in the high and low SNR regimes,

$$I_{AWGN}(a_k; \hat{a}_k | \mathcal{E}_{se}(F), d_k) = \log\left(1 + \frac{P d_k^2 \lambda_k^2}{\pi e\, \mathcal{E}_{se}(F)}\right). \qquad (20)$$

The piece-wise linear approximation in (19) is motivated by the fact that, as we showed in
Section III, $\Omega(\gamma)$ approaches 0 as $P \to \infty$; hence the modulo vector perturbation channel in the
high-SNR regime is a high-SNR AWGN channel, while for low SNR it can be seen as a zero
mutual information channel. Also note that expressions of the logarithmic form, as in (20), are
obtained when linear precoding schemes are used with Gaussian inputs, as the received signal
is also Gaussian.

We see that $I_{PW}$ is a much tighter approximation to the modulo vector perturbation channel than to $I_{AWGN}$.
The maximum difference with the piece-wise approximation is at most 1 bit for the AWGN channel
and only $\approx 0.2992$ bit for the modulo vector perturbation channel. Note also that the range of
$\lambda_k$ where the difference is non-negligible is much smaller for the modulo vector perturbation channel, which
also explains why such an approximation is of less interest for linear precoding systems.
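
The two gap figures quoted above can be reproduced numerically. The sketch below writes all three rates as functions of the single ratio $x = P d_k^2 \lambda_k^2 / (\pi e\, \mathcal{E}_{se}(F))$ and assumes that the modulo channel's mutual information equals $-2H(\xi)$ with $\gamma = 1/(2\pi e x)$, which follows from (18) and (10). The grid, the integration rule and the function names are illustrative assumptions.

```python
import math

def wrapped_entropy_bits(gamma, steps=1000, terms=12):
    """Entropy (bits) of a N(0, gamma) variable wrapped onto [-1/2, 1/2)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * gamma)
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        xi = -0.5 + (i + 0.5) * h
        f = norm * sum(math.exp(-(xi - s) ** 2 / (2.0 * gamma))
                       for s in range(-terms, terms + 1))
        if f > 0.0:
            total -= f * math.log2(f) * h
    return total

def modulo_mi(x):
    """Per-user mutual information of the modulo channel: log2(x) + 2*Omega."""
    gamma = 1.0 / (2.0 * math.pi * math.e * x)
    return -2.0 * wrapped_entropy_bits(gamma)

def piecewise(x):
    """Piece-wise linear approximation (19)."""
    return max(0.0, math.log2(x))

def awgn(x):
    """Matched AWGN expression (20)."""
    return math.log2(1.0 + x)

grid = [10 ** (e / 10) for e in range(-20, 21)]  # x from 0.01 to 100
gap_modulo = max(modulo_mi(x) - piecewise(x) for x in grid)
gap_awgn = max(awgn(x) - piecewise(x) for x in grid)
```

Both gaps peak at $x = 1$, the corner of the piece-wise function: the AWGN gap is exactly 1 bit there, while the modulo channel's gap comes out close to 0.2992 bit.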

We propose to take advantage of the tightness of the piecewise lower bound to simplify the
method of rate allocation. Specifically, we propose to maximize the rate allocation function

$$R_{VP,PW} = \sum_{k=1}^{K} \max\left(0, \log\frac{P d_k^2 \lambda_k^2}{\pi e\, \mathcal{E}_{se}(F)}\right). \qquad (21)$$

From the above we know that the difference between the actual sum rate and this
piece-wise approximation is at most $0.2992K \le 0.2992N_T$ bits. To remove the difficulty in
optimization imposed by the dependence on the $\mathcal{E}_{se}$ function we again use the lower bound in
(13), noting that the precoding matrix now gives

$$\mathcal{E}_{se}(F) \ge \frac{K\, \Gamma(K+1)^{1/K}}{(K+1)\pi} \det(\Lambda^2)^{1/K}, \qquad (22)$$

as $\det(V) = 1$ and $QQ^\dagger = I$. By inserting (22) into (21) we get

$$R_{VP,PW} \le \sum_{k=1}^{K} \max\left\{0, \log\frac{P}{K} - \log\left(\frac{\Gamma(K+1)^{1/K}\, e}{K+1}\right) + \log d_k^2 + \log \lambda_k^2 - \frac{1}{K}\sum_{j=1}^{K} \log \lambda_j^2\right\}. \qquad (23)$$

The value of using (22) as an approximation has been examined in [15]. We now examine how
the rate allocation proceeds from here. To simplify (23) we set

$$c = \frac{1}{K}\sum_{k=1}^{K} \log \lambda_k^2$$

and $\log(\lambda'_k)^2 = \log \lambda_k^2 - c$. Substituting this into (23) we obtain

$$R_{VP,PW} \le \sum_{k=1}^{K} \max\left\{0, R_{0,k} + \log(\lambda'_k)^2\right\},$$

where $R_{0,k} = \log\frac{P}{K} - \log\left(\frac{\Gamma(K+1)^{1/K}\, e}{K+1}\right) + \log d_k^2$. Now, if we place the restriction that $K$ users must
be used, then, since $\sum_{k=1}^{K} \log(\lambda'_k)^2 = 0$, the sum rate is at most

$$R_{VP,PW} \le \sum_{k=1}^{K} \left(R_{0,k} + \log(\lambda'_k)^2\right) = \sum_{k=1}^{K} R_{0,k} = K \log\frac{P}{K} + \log\det(W) - K \log\left(\frac{\Gamma(K+1)^{1/K}\, e}{K+1}\right) = R_{VP,UB}.$$

Note that if $\lambda'_k$ is chosen so that $\log d_k^2 + \log(\lambda'_k)^2$ is the same for every user, then either all
or none of the users are in the non-zero rate regime. This choice of $\lambda'_k$ corresponds to standard
vector perturbation as outlined in Section II.
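
A small numerical check of the normalisation may help. It assumes channel inversion, i.e. $\Lambda = D^+$ so that $\lambda_k = 1/d_k$ (which makes $F = Q^+V^+\Lambda$ equal to $H^+$), with hypothetical gains $d_k$ and logarithms to base 2.

```python
import math

d = [2.0, 1.0, 0.5]                 # hypothetical channel gains d_k
lam = [1.0 / dk for dk in d]        # channel inversion: Lambda = D^+
c = sum(math.log2(l ** 2) for l in lam) / len(lam)
log_lam_prime_sq = [math.log2(l ** 2) - c for l in lam]
# User-dependent part of R_{0,k} + log(lambda'_k)^2:
per_user_offset = [math.log2(dk ** 2) + lp
                   for dk, lp in zip(d, log_lam_prime_sq)]
```

The normalised terms $\log(\lambda'_k)^2$ sum to zero, and the user-dependent part of $R_{0,k} + \log(\lambda'_k)^2$ is identical for all users, so under this choice every user sits on the same side of the on-off threshold.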

We see that by making this piece-wise linear approximation to the mutual information, and by
using the $\mathcal{E}_{se}$ approximation, the best sum rate obtainable due to rate allocation is approached by
simply selecting users so as to maximize $R_{VP,UB}$. To summarize, as a consequence of the modulo
vector perturbation channel for a particular user being effectively a high-SNR AWGN channel
in the high-SNR regime, and a zero mutual information channel in the low-SNR regime, the
difference between an on-off assumption and the modulo vector perturbation channel (0.2992 bit)
is much less than the difference between the on-off assumption and the AWGN assumption ($\le 1$
bit, and for a much greater range of gains). Consequently, we would expect that, to approach
the maximum sum rate, it is sufficient to select the users that will maximize the high-SNR sum
rate upper bound given by Corollary 2. Moreover, it is sufficient to use the standard channel
inverse precoding matrix to achieve this rate.
V. User Selection Algorithm



We now turn to user selection, both as a rate allocation algorithm and for use in scenarios
where the number of potential users $U$ is greater than the number of transmit antennas. We
propose an algorithm, which we refer to as greedy rate maximization (GRM), for user scheduling
with vector perturbation precoding. GRM is a low-complexity scheme, which can be considered
a greedy algorithm to maximize the sum rate upper bound of Corollary 2. It turns out that the
criterion for selecting users is similar to that used for zero-forcing dirty-paper coding in [16], and
modified for zero-forcing beamforming in [5]. We discuss the differences in the algorithms, in
terms of shedding users and terminating the user selection process. It should be noted that our
proposed greedy algorithm focuses on maximizing the sum rate, and in doing so fairness among
the users is not guaranteed.

Denote $\mathcal{S}$ as the set of users that have
been selected, with cardinality $K = |\mathcal{S}|$, and $\mathcal{U}$ as the set of users who have not been
selected or removed from consideration. For the selected users $\mathcal{S}$ we denote $H(\mathcal{S})$ as the channel
matrix constructed from these users, and $W(\mathcal{S}) = H(\mathcal{S})H(\mathcal{S})^\dagger$. The algorithm we propose here
maximizes the high-SNR upper bound of Corollary 2 by maximizing $\det(W(\mathcal{S}))$. From (13) we
note that maximizing $\det(W(\mathcal{S}))$ is equivalent to minimizing the lower bound on $\mathcal{E}_{se}(F)$. The algorithm is
as follows:

1) Initialize the set of selected users $\mathcal{S} = \emptyset$, and set $\mathcal{U}$ to the set of all users.

2) Calculate $\det(W(\mathcal{S} \cup u))$ for all users $u \in \mathcal{U}$. Determine $u_{max}$, the user that maximizes
$\det(W(\mathcal{S} \cup u))$.

3) Remove from $\mathcal{U}$ all those users such that $R_{VP}$ would be reduced if they were to be added
to $\mathcal{S}$. Precisely, remove user $u$ if

$$\frac{\det(W(\mathcal{S} \cup u))}{\det(W(\mathcal{S}))} < \frac{e (K+1)^{2K+1}}{P K^K (K+2)^{K+1}} \qquad (24)$$

and $K > 1$. (We will provide a low complexity way of calculating the left hand side of
this equation.)

4) If $\mathcal{U}$ is non-empty, add user $u_{max}$ to $\mathcal{S}$ and remove it from $\mathcal{U}$, and return to step 2.

5) If $\mathcal{U}$ is empty or $K = N_T$, terminate the algorithm.
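
The steps above can be sketched as follows. This is an illustrative reading of the algorithm rather than a reference implementation: the shedding test compares the Corollary 2 bound with and without the candidate directly instead of evaluating the closed-form threshold in (24), determinant ratios are obtained from orthogonalised channel vectors, logarithms are to base 2, and the function names and example channel are assumptions.

```python
import math

def inner(u, v):
    """Row-vector inner product u v^dagger."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def upper_bound_bits(K, P, log2_det_W):
    """Corollary 2 bound for K selected users, given log2 det(W(S))."""
    if K == 0:
        return 0.0
    gamma_term = math.exp(math.lgamma(K + 1) / K)
    return (K * math.log2(P / K) + log2_det_W
            - K * math.log2(gamma_term * math.e / (K + 1)))

def grm_select(H, P):
    """Greedy user selection; H is a list of per-user channel row vectors."""
    n_tx = len(H[0])
    residual = {u: list(h) for u, h in enumerate(H)}  # orthogonalised vectors
    selected, log2_det = [], 0.0
    while residual and len(selected) < n_tx:
        K = len(selected)
        # ||g_u||^2 equals det(W(S u u)) / det(W(S)).
        gains = {u: inner(g, g).real for u, g in residual.items()}
        if K > 1:  # shedding guard as stated in Step 3
            current = upper_bound_bits(K, P, log2_det)
            for u, gain in gains.items():
                if gain <= 1e-12 or upper_bound_bits(
                        K + 1, P, log2_det + math.log2(gain)) < current:
                    del residual[u]
        if not residual:
            break
        best = max(residual, key=gains.get)
        if gains[best] <= 1e-12:
            break  # remaining users add no new dimension
        g_best = residual.pop(best)
        selected.append(best)
        log2_det += math.log2(gains[best])
        # Remove the component along g_best from every remaining vector.
        residual = {u: [gi - (inner(g, g_best) / gains[best]) * bi
                        for gi, bi in zip(g, g_best)]
                    for u, g in residual.items()}
    return selected

# Hypothetical 4-user, 4-antenna channel with orthogonal rows of unequal gain.
H = [[2.0 + 0j, 0j, 0j, 0j], [0j, 1.5 + 0j, 0j, 0j],
     [0j, 0j, 1.0 + 0j, 0j], [0j, 0j, 0j, 0.5 + 0j]]
high_snr = grm_select(H, 1e4)
low_snr = grm_select(H, 0.01)
```

At high SNR all four users are kept, whereas at low SNR the weaker users are shed once the guard $K > 1$ is passed, leaving the two strongest users.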

We now compare the operations performed by GRM with Greedy-ZF [16] and semi-orthogonal
user selection (SUS) [5]. First, we show that the metric $\det(W(\mathcal{S} \cup u))$ in Step 2 above, which
determines the users to be picked, is equivalent to that used in Greedy-ZF and SUS. Thus,
we show that the Greedy-ZF and SUS algorithms can essentially be viewed as greedy determinant
maximization algorithms. Therefore, the difference between the algorithms comes down to how
the users are removed from $\mathcal{U}$ to reduce the complexity.

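To show the equivalence of the choice of the next user to add to $\mathcal{S}$, we note that if we append
a user $u$ with channel vector $h_u$ to a set $\mathcal{S}$, and apply the block matrix determinant formula
to $\det(W(\mathcal{S} \cup u))$, we obtain

$$\det(W(\mathcal{S} \cup u)) = \det\begin{bmatrix} H(\mathcal{S})H(\mathcal{S})^\dagger & H(\mathcal{S})h_u^\dagger \\ h_u H(\mathcal{S})^\dagger & h_u h_u^\dagger \end{bmatrix} = \det(W(\mathcal{S}))\, \|h_u(I - P(\mathcal{S}))\|^2, \qquad (25)$$

where $P(\mathcal{S}) = H(\mathcal{S})^\dagger (H(\mathcal{S})H(\mathcal{S})^\dagger)^{-1} H(\mathcal{S})$ is the projection matrix for the subspace spanned by
the rows of $H(\mathcal{S})$, which we denote $\mathcal{H}(\mathcal{S}) \subset \mathbb{C}^{1 \times N_T}$. The matrix $I - P(\mathcal{S})$ is the projection matrix for
the nullspace of $H(\mathcal{S})$. It follows from (25) that the user in $\mathcal{U}$ that maximizes the
determinant given $H(\mathcal{S})$ is the user with channel vector $h_u$ that has the largest component in
the nullspace of $H(\mathcal{S})$.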
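
The identity (25) is easy to confirm numerically in the smallest non-trivial case, where $\mathcal{S}$ holds a single user and a second user is appended. The channel vectors below are arbitrary illustrative values.

```python
def inner(u, v):
    """Row-vector inner product u v^dagger."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

h1 = [1.0 + 1.0j, 0.5j, -0.3 + 0j]   # already selected user (hypothetical)
h2 = [0.2 + 0j, 1.0 - 1.0j, 0.7j]    # candidate user (hypothetical)

# Left-hand side: determinant of the 2x2 Gram matrix W(S u u).
det_S = inner(h1, h1).real
det_Su = (inner(h1, h1) * inner(h2, h2) - inner(h1, h2) * inner(h2, h1)).real

# Right-hand side: det(W(S)) times the squared norm of h2 projected
# onto the nullspace of h1.
coef = inner(h2, h1) / det_S
g = [b - coef * a for a, b in zip(h1, h2)]
rhs = det_S * inner(g, g).real
```

The two sides agree to machine precision, and `g` is orthogonal to `h1`, as expected of a nullspace component.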

It is worthwhile to note that the condition given above is the same as that specified by the Greedy-ZF
and SUS algorithms. However, the motivations behind these other algorithms are slightly
different, as the users are chosen to maximize the individual user gains in order to maximize
the sum rate. In GRM we attempt to maximize the sum rate by minimizing the transmit power
scaling $\mathcal{E}_{se}$ via maximizing $\det(W)$. However, by noting this similarity, we are able to take
advantage of the lower complexity method in [5] to calculate the component of the channel vectors
orthogonal to $\mathcal{H}(\mathcal{S})$. That is, instead of calculating $h_u(I - P(\mathcal{S}))$ directly, we calculate

$$h_u(I - P(\mathcal{S})) = g_u = h_u\left(I - \sum_{s \in \mathcal{S}} \frac{g_s^\dagger g_s}{\|g_s\|^2}\right), \qquad (26)$$

where $g_s$ is the value of $g_u$ calculated in the previous iterations of the algorithm for the selected user $s$. Note that this
makes $\{g_s\}$ an orthogonal set of vectors, and that each $g_u$ is also orthogonal to these vectors.
Therefore, we propose that Step 2 of the algorithm is performed by choosing the user with the
greatest value of $\|g_u\|^2$, thus avoiding the calculation of determinants.

We see that $\|g_u\|^2$ can also be used for user shedding in Step 3 of the algorithm, as $\|g_u\|^2 =
\det(W(\mathcal{S} \cup u))/\det(W(\mathcal{S}))$. Note here that as $K$ increases, $\|g_u\|$ is non-increasing, and the
right-hand side of (24) is increasing. It follows that we can remove user $u$ from $\mathcal{U}$, as it will
always decrease the rate upper bound. As we will see in the next section, this user shedding
reduces the complexity of the algorithm, and results in a better sum rate performance than other
algorithms.

Note that Greedy-ZF does not perform user shedding, while the SUS algorithm performs user
shedding by keeping only those vectors that are semi-orthogonal to the most recent vector
added to $\mathcal{S}$. Specifically, all users satisfying

$$\cos^2\theta(g_s, h_u) \triangleq \frac{|h_u g_s^\dagger|^2}{\|h_u\|^2 \|g_s\|^2} > \alpha^2 \qquad (27)$$

are removed, where $\alpha$ is a parameter in the interval $[0, 1]$. Note that the optimal value of $\alpha$ for
a specific antenna/user configuration and channel distribution/SNR can only be determined via
simulation. This is in contrast to our proposed GRM scheme, which only requires knowledge of
$P$, rather than the full channel statistics.

As demonstrated in the next section, the run-time complexity of GRM, Greedy-ZF and SUS is
similar. Note that SUS requires the further calculation of (27) as part of its user shedding,
thus making it more complex for the same size $\mathcal{U}$ than our proposed GRM algorithm.
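
For comparison, the SUS shedding test (27) can be sketched as below. The function names, the candidate vectors and the value of $\alpha$ are illustrative assumptions; in practice $\alpha$ would be tuned by simulation as described above.

```python
def inner(u, v):
    """Row-vector inner product u v^dagger."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def cos2_angle(g_s, h_u):
    """Squared cosine of the angle between g_s and h_u, as in (27)."""
    return abs(inner(h_u, g_s)) ** 2 / (inner(h_u, h_u).real * inner(g_s, g_s).real)

def sus_keep(g_s, candidates, alpha):
    """Return the candidates that are NOT removed by the test (27)."""
    return {u: h for u, h in candidates.items()
            if cos2_angle(g_s, h) <= alpha ** 2}

g_s = [1.0 + 0j, 0j]  # most recently selected (orthogonalised) vector
candidates = {
    "parallel": [2.0j, 0j],
    "orthogonal": [0j, 1.0 + 1.0j],
    "oblique": [1.0 + 0j, 1.0 + 0j],
}
kept = sus_keep(g_s, candidates, alpha=0.8)
```

With $\alpha = 0.8$ the parallel candidate ($\cos^2\theta = 1$) is shed, while the orthogonal ($\cos^2\theta = 0$) and oblique ($\cos^2\theta = 0.5$) candidates are kept. Unlike the GRM test, this decision does not depend on $P$.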

VI. Simulation Results 

In this section we present simulation results for the sum rate performance of VP with and without
user scheduling. In Figs. 2 and 3, we consider a system with $N_T = U = K = 4$ and 8 respectively.
We plot the exact sum rate of VP precoding given by Theorem 1, denoted VP-exact, where $\mathcal{E}_{se}$
is obtained by Monte Carlo simulation. We also plot the high-SNR upper bound for VP,
which is $\max\{0, R_{VP,UB}\}$, where $R_{VP,UB}$ is given by Corollary 2. For comparison purposes, we
include the plots for DPC and ZF-WF [5]. We used 1000 independent channel realizations to
obtain these plots. The plots show that VP-exact outperforms ZF-WF, except at low SNR, where
ZF-WF is better due to waterfilling. We also note that the high-SNR upper bound for VP is tight
for SNRs greater than 10 dB.

In Fig. 4, we focus on user scheduling schemes with system parameters $N_T = U = 8$ and
$K \le U$. We plot the loss in sum rate of VP-GRM and VP-SUS compared to an exhaustive search
for VP over all user combinations (which we denote VP-ES). Extensive simulations are used
to obtain the optimal values of $\alpha$ for the VP-SUS curve, and these values are provided in the
figure. We see that VP-GRM performs better than VP-SUS in the low to medium SNR region.
Clearly, in this region, the GRM algorithm's sum rate based criterion is particularly effective at
shedding users, compared with the SUS algorithm's orthogonality criterion. At high SNR, the
two curves meet. In this region, the GRM algorithm's sum rate based criterion is dominated by
the factor $K \log(P/K)$ and thus $K = N_T$ users will always be chosen. Since the curves are on
top of each other, SUS must also be choosing $K = N_T$ users, by selecting its optimal value of
$\alpha$ close to 1.

In Table I, we show the average number of users selected at various SNR levels by
the proposed algorithm VP-GRM and compare it with VP-SUS. We use $N_T = U = 8$ and
$K \le U$. This table demonstrates that the two algorithms indeed perform user shedding differently.
Consequently, the two algorithms have different sum rate performance, with VP-GRM performing
better than VP-SUS.

In Table II, we analyze the complexity of the two algorithms by averaging the total number of
vector multiplications required for each algorithm. The complexity is calculated by averaging
over 1000 independent channel realizations. For GRM, only 2 vector
multiplications are required in (26), while SUS requires a further vector multiplication for the user shedding
operation in (27). However, the overall relative complexities are not obvious, since the algorithms
may not shed the same number of users. The table shows that the GRM complexity is in fact
less than that of SUS. The complexity of both algorithms increases with increasing SNR, as
they tend to shed fewer users at higher power levels.
In Fig. 5, we show the performance comparison of the VP-GRM and VP-SUS algorithms when
$N_T = 8$ but now $U$ ranges from 2 to 24 and $K \le N_T$. We show the sum rate results for SNR =
0, 5 and 10 dB. We again used optimal values of $\alpha$ for VP-SUS. We see that VP-GRM
performs better than VP-SUS for the whole range of $U$ for SNR = 0 and 5 dB. For SNR =
10 dB, however, VP-SUS matches the VP-GRM performance for higher values of $U$, as both algorithms,
as discussed above, select users which are effectively in the high-SNR regime, and hence $K$ is
close to $N_T$.

In Fig. 6, we examine the rate allocation scheme proposed in Appendix II and the GRM based 
user selection algorithm. We plot the performance of the algorithms when used independently, 
and also for the case when the rate allocation is performed after the users are selected. We 
examine the scenario where $N_T = U = 8$. We see that both algorithms improve the sum rate
when used independently, especially for lower SNRs. Moreover, the sum rate is barely increased 
when the rate allocation algorithm is applied after the user selection. This is expected from the 
analysis of Section IV, where we see that in order to maximize the sum rate it is more important 
to select the users, rather than allocate (non-zero) rates to the users directly. In addition, after 
the user selection, all the selected users will be operating in the high-SNR regime, and therefore 
there is little to be gained by performing an additional rate allocation. 

VII. Conclusion and Future Work 

In this work, we examined the sum rate of vector perturbation schemes, based on the
assumptions of a uniformly distributed channel input and the tightness of the spherical Voronoi
region approximation to $\mathcal{E}_{se}$. We derived expressions in terms of the determinant of the channel
Hermitian, and simulation results demonstrate the tightness of the bounds.

We then proceeded to the problem of individual rate allocation, as is commonly applied to 
other multiuser schemes to optimise the sum rate. However, we discovered that the modulo 
operation at the demodulator for vector perturbation precoding implies that the channel may as 
well be turned off when the gain is too low. Therefore only channels with high gains should be 
used where the energy can be applied more efficiently. Moreover, this choice of rate
allocation corresponds to standard vector perturbation precoding employing the channel inversion
precoding matrix. Nevertheless, there may be a value in reconsidering the rate allocation problem 
with respect to scheduling fairness, different channel models, or variations of vector perturbation 
precoding. 

It follows that user selection is the most important step to maximize the sum rate, regardless 
of whether the number of users exceeds the number of transmit antennas. Based on
our high-SNR upper bound, we saw that this corresponds to determinant maximization. We 
proposed a greedy algorithm for this, which is essentially the same algorithm as semi-orthogonal 
user selection proposed in the context of ZFBF [5], but with more appropriate user shedding 
criteria, resulting in a lower-complexity and better performing algorithm which does not require 
optimization over the channel statistics. Naturally, the design and analysis of limited feedback 
techniques [15,21] for the efficient collection of CSI at the transmitter with respect to the 
user selection process is required. As noted above, scheduling fairness among users is another
important issue to consider, which becomes all the more important when users are not assumed to
have the same received SNR (i.e. a heterogeneous system model). A full treatment of this issue will
be an important extension of this work. Also, in this work we have only considered
single-antenna users; the impact of multiple-antenna receivers on the sum rate
performance and scheduling complexity of vector perturbation precoding systems remains
outstanding future work.
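
As an illustration of the determinant maximization view of user selection, the following is a minimal Python sketch of a generic greedy selection by successive orthogonal projection. It is not the GRM algorithm itself: the shedding criterion in (27) is omitted and the number of users `K` is fixed in advance. The function name `greedy_det_selection` and the tolerance `1e-12` are assumptions made for this sketch.

```python
import numpy as np

def greedy_det_selection(H, K):
    """Greedily pick up to K rows (users) of H so that det(H_S H_S^H) grows fastest.

    H is a U x N_T channel matrix with one row per user (hypothetical layout).
    """
    U = H.shape[0]
    chosen = np.zeros(U, dtype=bool)
    selected = []
    residual = H.astype(complex)  # copy; rows get projected off the chosen users
    for _ in range(K):
        norms = np.sum(np.abs(residual) ** 2, axis=1)
        norms[chosen] = -1.0      # never re-pick a user
        k = int(np.argmax(norms))
        if norms[k] <= 1e-12:     # remaining users add no new dimension
            break
        selected.append(k)
        chosen[k] = True
        q = residual[k] / np.sqrt(norms[k])  # unit vector of the new direction
        # Gram-Schmidt step: remove the component along q from every row.
        residual = residual - np.outer(residual @ q.conj(), q)
    return selected
```

The determinant of the Gram matrix of the selected rows equals the product of the squared residual norms at the time each user is picked, so choosing the largest residual norm at each step is the greedy choice for the determinant.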

Appendix I: Proof of Theorem 1 
Proof: First, note that for each $k = 1, \ldots, K$ we have

$$I(a_k; \hat{a}_k) = H(\hat{a}_k) - H(\hat{a}_k \mid a_k). \tag{28}$$

Since $\hat{a}_k$ is restricted to CUBE, it follows that $H(\hat{a}_k)$ is maximized if $\hat{a}_k$ is uniformly distributed.
This is achieved if $a_k$ is uniformly distributed, in which case

$$H(\hat{a}_k) = \log \mathrm{Vol}(\mathrm{CUBE}) = 0.$$
In order to calculate $H(\hat{a}_k \mid a_k)$, we first define a few terms. As discussed above, $a_k$ is
uniformly distributed, and we use $f(a_k)$ to denote the p.d.f. of $a_k$. Now for all $k$, we denote

$$v_k = [\hat{a}_k - a_k] \bmod \mathrm{CUBE}, \tag{29}$$

where the p.d.f. of $v_k$ is given by

$$f(v_k) = f(\hat{a}_k \mid a_k), \tag{30}$$

where $f(\hat{a}_k \mid a_k)$ is the p.d.f. of $\hat{a}_k$ conditioned on $a_k$.

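Noting that $f(v_k)$ is the same for all $k$, and that $v_k$ is i.i.d. for the real and imaginary dimensions,
we can define $\xi = \mathrm{Re}\{v_k\}$. Now, $f(\xi)$ has a modulo-Gaussian distribution given by

$$f(\xi) = \phi(\gamma) \sum_{s=-\infty}^{\infty} e^{-\frac{|\xi - s|^2}{2\gamma}}, \tag{31}$$

and

$$\gamma \triangleq \frac{\mathcal{E}_{se}}{2P}. \tag{32}$$
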
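Now, to calculate $H(\hat{a}_k \mid a_k)$ we have

$$\begin{aligned}
H(\hat{a}_k \mid a_k) &= -\int_{\mathrm{CUBE}} f(a_k) \int_{\mathrm{CUBE}} f(\hat{a}_k \mid a_k) \log f(\hat{a}_k \mid a_k) \, d\hat{a}_k \, da_k \\
&= -\int_{\mathrm{CUBE}} f(\hat{a}_k \mid a_k) \log f(\hat{a}_k \mid a_k) \, d\hat{a}_k,
\end{aligned}$$

where the second equality follows from the fact that the inner integral is the same for all
$a_k \in \mathrm{CUBE}$, and that $f(a_k)$ is uniform. Using the definitions above, we write

$$H(\hat{a}_k \mid a_k) = H(v_k) = 2h(\xi) = -2 \int_{-1/2}^{1/2} f(\xi) \log f(\xi) \, d\xi. \tag{33}$$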

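Now, using $\phi(\gamma) = 1/\sqrt{2\pi\gamma}$, and inserting (31) into (33) we get

$$\begin{aligned}
h(\xi) &= -\int_{-1/2}^{1/2} \left( \sum_{s=-\infty}^{\infty} \phi(\gamma) e^{-\frac{|\xi - s|^2}{2\gamma}} \right) \log \left( \sum_{t=-\infty}^{\infty} \phi(\gamma) e^{-\frac{|\xi - t|^2}{2\gamma}} \right) d\xi \\
&= -\log \phi(\gamma) - \int_{-1/2}^{1/2} \sum_{s=-\infty}^{\infty} \phi(\gamma) e^{-\frac{|\xi - s|^2}{2\gamma}} \log \sum_{t=-\infty}^{\infty} e^{-\frac{|\xi - t|^2}{2\gamma}} \, d\xi \\
&= \tfrac{1}{2} \log 2\pi e \gamma - \Omega(\gamma),
\end{aligned} \tag{34}$$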
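where we recall the definition of $\Omega(\gamma)$ in (9). Therefore

$$\begin{aligned}
R_{VP}(\mathbf{H}, \mathbf{F}) &= \sum_{k=1}^{K} I(a_k; \hat{a}_k \mid \mathbf{H}, \mathbf{F}) \\
&= -K \log 2\pi e \gamma + 2K\,\Omega(\gamma) \\
&= K \log P - K \log \pi e\, \mathcal{E}_{se}(\mathbf{F}) + 2K\,\Omega\!\left( \frac{\mathcal{E}_{se}(\mathbf{F})}{2P} \right),
\end{aligned}$$

which gives the theorem. ■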
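
The entropy of the modulo-Gaussian density in (31) can also be checked numerically. The following minimal Python sketch evaluates $h(\xi)$ by trapezoidal integration over $[-1/2, 1/2]$ and from it the per-user rate $-2h(\xi)$. The function names, the truncation of the infinite sum to `terms` shifts, and the grid size `n` are assumptions of this sketch; $\Omega(\gamma)$ is obtained here through the relation in (34) rather than from its definition in (9). Natural logarithms are used internally.

```python
import numpy as np

def mod_gaussian_pdf(xi, gamma, terms=50):
    """Density of a N(0, gamma) variable wrapped into [-1/2, 1/2), cf. (31)."""
    s = np.arange(-terms, terms + 1)
    phi = 1.0 / np.sqrt(2.0 * np.pi * gamma)
    return phi * np.exp(-(xi[:, None] - s[None, :]) ** 2 / (2.0 * gamma)).sum(axis=1)

def mod_gaussian_entropy(gamma, n=20001):
    """Differential entropy h(xi) in nats via the trapezoidal rule."""
    xi = np.linspace(-0.5, 0.5, n)
    f = mod_gaussian_pdf(xi, gamma)
    # Clip before the log so that underflowed densities contribute 0, not NaN.
    integrand = -f * np.log(np.clip(f, np.finfo(float).tiny, None))
    dx = xi[1] - xi[0]
    return float(np.sum(0.5 * (integrand[:-1] + integrand[1:])) * dx)

def omega(gamma):
    """Correction term implied by (34): 0.5*log(2*pi*e*gamma) - h(xi), in nats."""
    return 0.5 * np.log(2.0 * np.pi * np.e * gamma) - mod_gaussian_entropy(gamma)

def per_user_rate_bits(gamma):
    """Per-user mutual information -2*h(xi), converted from nats to bits."""
    return -2.0 * mod_gaussian_entropy(gamma) / np.log(2.0)
```

For small $\gamma$ the wrapping is negligible and `omega(gamma)` is close to zero, so the rate approaches $-\log 2\pi e \gamma$. For large $\gamma$ the wrapped density tends to the uniform density on the unit interval and the rate tends to zero.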

Appendix II: A Sub-optimal Rate Allocation Scheme 

As we discussed in Section IV, exactly solving the optimization problem of finding the rate
allocation matrix $\mathbf{\Lambda}$ is difficult as it involves finding $\mathcal{E}_{se}(\mathbf{F})$, which is NP-hard. Hence, we resort
to a simpler sub-optimal iterative algorithm for the choice of $\mathbf{\Lambda}$.

Assuming the output of each user's demodulator to be Gaussian (instead of modulo-Gaussian),
the sum-rate for this $N_T \times K$ vector perturbation system is given by

K 

i? V P-zF = ]Tlog(l + ^), (35) 

k=i 

where 5 l = e£w) d l- 

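We propose to use an iterative algorithm which tries to find the rate allocation matrix $\mathbf{\Lambda}$ as follows:

1) Initialize with the lower bound on $\mathcal{E}_{se}(\mathbf{F})$ calculated by using (13) with $\mathbf{\Lambda} = \mathbf{I}_K$.

2) Update $\mathbf{\Lambda}$ by using standard waterfilling

$$\lambda_k^2 = \max\left\{ 0, \left( \zeta - \frac{1}{\delta_k} \right) \right\}, \tag{36}$$
where the water level $\zeta$ is chosen as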

iH o 'H)H' <37) 

3) Update $\mathcal{E}_{se}(\mathbf{F})$ with the new precoding matrix $\mathbf{F}$.

4) Repeat 2) and 3) until $\mathbf{\Lambda}$ converges.

We then use this $\mathbf{\Lambda}$ to calculate the sum-rate using Corollary 3. The algorithm is suboptimal
because the approximation to $\mathcal{E}_{se}$ is used, the received signal is assumed to be subject to Gaussian
rather than modulo-Gaussian noise, and the algorithm converges to a local optimum which may
not be the global optimum.
Fig. 1. Plot of mutual information (bps/Hz) versus $\lambda_k$ from Corollary 3, the piecewise approximation given by (19) and the
Gaussian channel expression given by (20). SNR = dB, $\mathcal{E}_{se} = 0.1$ and $d_k = 1$.

Fig. 2. Plot of sum rate (bps/Hz) versus SNR (dB) for DPC, VP-exact, VP upper bound and zero-forcing with waterfilling
(ZF-WF). $U = K = N_T = 4$.

Fig. 3. Plot of sum rate (bps/Hz) versus SNR (dB) for DPC, VP-exact, VP upper bound and zero-forcing with waterfilling
(ZF-WF). $U = K = N_T = 8$.

Fig. 4. Plot of loss of sum rate (bps/Hz) versus SNR (dB) for VP-GRM and VP-SUS compared to exhaustive search for VP.
$N_T = U = 8$ and $K \le U$.



TABLE I

Average number of users selected for VP-GRM and VP-SUS. $N_T = U = 8$ and $K \le U$.

          SNR=0dB   SNR=5dB   SNR=10dB  SNR=15dB  SNR=20dB  SNR=25dB  SNR=30dB
VP-GRM    2.3330    4.3480    6.0350    7.0220    7.5400    7.8370    7.9450
VP-SUS    2.0570    4.5920    5.4160    7.0480    7.9480    7.9480    7.9850

TABLE II

Average number of vector multiplications for VP-GRM and VP-SUS. $N_T = U = 8$ and $K \le U$.

          SNR=0dB   SNR=10dB  SNR=20dB  SNR=30dB
VP-GRM    27.4      62.8      70.8      71.88
VP-SUS    34.5      64.2      100.8     104.67


Fig. 5. Plot of sum rate (bps/Hz) versus number of users for VP-GRM and VP-SUS. $N_T = 8$ and SNR = 0, 5 and 10 dB.

Fig. 6. Plot of sum rate (bps/Hz) versus SNR (dB) for VP-exact, VP-exact with rate allocation from Appendix II, VP with
GRM, and VP with GRM and rate allocation from Appendix II. $N_T = U = 8$ and $K \le N_T$.



